I’m creating Rmds for each of the datastreams used in this project. TEROS data are relatively simple to ingest because they’re pulled from GDrive, where they are already formatted. However, because there are so many TEROS sensors, there’s a significant amount of quality control to document and justify.
We deployed a total of XX TEROS sensors across the plots as part of the initial TEMPEST setup. All three plots are outfitted with similar TEROS deployments: many grid cells have a TEROS sensor at 15 cm only, and a subset of grid cells have sensors at 5, 15, and 30 cm. Including all sensors is a little tricky because some are intermittent, so significant QC effort goes into data prep. We will be looking at both 5-minute and 15-minute data: 15-minute data are the standard (collected year-round), while 5-minute data are only collected in proximity to treatment events. Initial examination suggests there are some differences between these datasets, so we’ll start by considering both.
First, let’s look at the initial datasets, and see what kinds of issues we’ll need to address with our TEROS QC process:
Per prior work (see earlier TEROS rework script referenced above), we know that some of the problems are with TEROS sensors in grid cells that only have a 15-cm depth (which is many of the grid cells). Since we are interested in depth-resolved data, and because we have limited spatial resolution for DO and redox, let’s first remove those to reduce the complexity of our dataset:
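Here’s a minimal sketch of that filter on toy data, not the actual chunk from this Rmd. The column names (`plot`, `grid_cell`, `depth_cm`) and the plot and grid-cell labels are made up for illustration, and it assumes "depth-resolved" means a grid cell instrumented at more than one depth.

```r
# Toy sensor inventory; all names and values are hypothetical
sensors <- data.frame(
  plot      = c("Control", "Control", "Control", "Control", "PlotB"),
  grid_cell = c("A1", "A1", "A1", "B2", "C3"),
  depth_cm  = c(5, 15, 30, 15, 15)
)

# Count the distinct depths instrumented in each plot x grid cell
n_depths <- aggregate(depth_cm ~ plot + grid_cell, data = sensors,
                      FUN = function(d) length(unique(d)))
names(n_depths)[names(n_depths) == "depth_cm"] <- "n_depths"

# Keep only grid cells with more than one depth (drops the 15-cm-only cells)
keep <- n_depths[n_depths$n_depths > 1, c("plot", "grid_cell")]
depth_resolved <- merge(sensors, keep, by = c("plot", "grid_cell"))
```

In this toy example only grid cell A1 in Control survives, since it is the only one with sensors at 5, 15, and 30 cm.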
Based on QC Step 1 above, we’re going to be using the 15-minute datasets. Next up, there are two sensors in Control that don’t match any of the other sensors, and are reading unreasonably high VWC values. Let’s strip them out.
Next, we need to check whether any sensors are missing so much data that they should not be gap-filled. We’ll do this simply by counting the number of records per sensor for a given period, and identifying sensors with large gaps that we can afford to lose:
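A rough sketch of the record-counting idea, using toy data rather than the real TEROS files. The sensor IDs, the dates, and the column names `sensor_id` and `datetime` are all placeholders; the expected count assumes one record every 15 minutes over the period.

```r
# One day of 15-minute timestamps (96 records); the date is arbitrary
times <- seq(as.POSIXct("2022-01-01 00:00", tz = "UTC"),
             by = "15 min", length.out = 96)

# Two hypothetical sensors: S1 is complete, S2 only logged half the day
teros <- rbind(
  data.frame(sensor_id = "S1", datetime = times),
  data.frame(sensor_id = "S2", datetime = times[1:48])
)

# Records expected for a complete 15-minute series over this period
expected <- length(times)

# Count records per sensor and express as percent of expected
counts <- as.data.frame(table(sensor_id = teros$sensor_id),
                        responseName = "n_records")
counts$pct_complete <- 100 * counts$n_records / expected
```

Sensors with a low `pct_complete` are the candidates to drop rather than gap-fill; where to set that cutoff is a judgment call this sketch doesn’t make.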
Next up is identifying any sensors that are intermittent, and figuring out if/how to gap-fill. We can see that, of the sensors included, only a couple are missing more than 1% of their time-series. That’s good news!
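For reference, the percent-missing check can be sketched like this. The data are toy values, `sensor_id` and `vwc` are assumed column names, and the 1% cutoff is just the threshold mentioned above.

```r
# Toy data: 200 records each for two hypothetical sensors
teros <- data.frame(
  sensor_id = rep(c("S1", "S2"), each = 200),
  vwc       = 0.3
)
teros$vwc[5]       <- NA   # S1: 1 of 200 missing (0.5%)
teros$vwc[201:210] <- NA   # S2: 10 of 200 missing (5%)

# Percent of each sensor's time-series that is NA
pct_missing <- tapply(teros$vwc, teros$sensor_id,
                      function(x) 100 * mean(is.na(x)))

# Sensors missing more than 1% of their records
flagged <- names(pct_missing)[pct_missing > 1]

# Total NA count across all sensors
n_na <- sum(is.na(teros$vwc))
print(paste(n_na, "NAs present"))
```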
## [1] "199 NAs present"
Gap lengths are generally not very long, though a gap of more than 10 records spans more than 2.5 hours at 15-minute resolution, which is a little worrying.
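One way to get gap lengths is run-length encoding on the NA mask. This is a sketch on a made-up vector for a single sensor, assuming records are evenly spaced at 15 minutes; it isn’t necessarily how the gaps were computed here.

```r
# Toy VWC series for one sensor with three gaps of different lengths
vwc <- c(0.30, NA, NA, 0.31, 0.32, NA, 0.33, rep(NA, 12), 0.34)

# Run-length encode the NA mask; runs where values == TRUE are gaps
runs <- rle(is.na(vwc))
gap_lengths <- runs$lengths[runs$values]

# Convert gap length (in records) to hours at 15-minute resolution
gap_hours <- gap_lengths * 15 / 60

# Gaps longer than 10 records, i.e. longer than 2.5 hours
long_gaps <- gap_lengths[gap_lengths > 10]
```

For real data this would need to be run per sensor on a complete, regular time index, since a timestamp that is absent altogether (rather than present with an NA value) won’t show up as a gap.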
Welp. It’s clear that there are some gaps during the event, so we’ll need to keep an eye on our gap-filling and see if it influences our dataset in any noticeable or meaningful way. The good news is, we’ll be averaging between multiple sensors for each combination of depth and plot, so it is unlikely any given final value will be determined exclusively from gap-filled data.
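To make "keeping an eye on gap-filling" concrete, here’s one possible approach: fill only short gaps by linear interpolation and keep a flag for which values were filled. Both the method and the `max_gap = 10` cutoff are assumptions for illustration, and `fill_short_gaps()` is a hypothetical helper, not a function from this project or any package.

```r
# Hypothetical helper: linearly interpolate NA runs of at most max_gap records
fill_short_gaps <- function(x, max_gap = 10) {
  runs <- rle(is.na(x))
  # TRUE for every position inside an NA run short enough to fill
  fillable <- rep(runs$values & runs$lengths <= max_gap, runs$lengths)
  idx <- seq_along(x)
  # approx() needs at least two non-NA points; leading/trailing NAs stay NA
  interp <- approx(idx[!is.na(x)], x[!is.na(x)], xout = idx)$y
  x[fillable] <- interp[fillable]
  x
}

# Toy series: one 1-record gap (filled) and one 12-record gap (left as NA)
vwc    <- c(1, NA, 3, rep(NA, 12), 16)
filled <- fill_short_gaps(vwc, max_gap = 10)

# Flag gap-filled values so their influence can be checked later
was_filled <- is.na(vwc) & !is.na(filled)
```

Carrying `was_filled` through to the plot-by-depth averages makes it possible to check how many sensors behind each final value were gap-filled.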
As a final step, now that we’ve cleaned and gap-filled our data, we need to decide if we want to use the mean or median of sensors to calculate our final TEROS time-series. We’ll look at all three variables to make this decision (and check that we don’t need additional QC for them).
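A quick sketch of the mean-versus-median comparison on toy data, for one variable only. The column names are assumed, `datetime` is a stand-in integer rather than a real timestamp, and the VWC values are invented so that one sensor reads high at the first time step.

```r
# Toy data: three hypothetical sensors at one plot x depth, two time steps
teros <- data.frame(
  datetime = rep(1:2, each = 3),
  plot     = "Control",
  depth_cm = 15,
  vwc      = c(0.30, 0.32, 0.60, 0.31, 0.33, 0.35)
)

# Group by time step, plot, and depth
by_vars <- list(datetime = teros$datetime,
                plot     = teros$plot,
                depth_cm = teros$depth_cm)

# Mean and median across sensors within each group
means   <- aggregate(teros["vwc"], by = by_vars, FUN = mean,   na.rm = TRUE)
medians <- aggregate(teros["vwc"], by = by_vars, FUN = median, na.rm = TRUE)

# Side-by-side comparison; a large difference suggests an outlying sensor
compare <- merge(means, medians, by = names(by_vars),
                 suffixes = c("_mean", "_median"))
compare$diff <- compare$vwc_mean - compare$vwc_median
```

In the toy data the high-reading sensor pulls the mean well above the median at the first time step, while the two agree at the second. That kind of divergence is what would point to either using the median or doing more QC.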